SpamCop: A Spam Classification & Organization Program

Authors

  • Patrick Pantel
  • Dekang Lin
Abstract

We present a simple, yet highly accurate, spam filtering program, called SpamCop, which is able to identify about 92% of the spams while misclassifying only about 1.16% of the nonspam e-mails. SpamCop treats an e-mail message as a multiset of words and employs a naive Bayes algorithm to determine whether or not a message is likely to be a spam. Compared with keyword-spotting rules, the probabilistic approach taken in SpamCop not only offers high accuracy, but also overcomes the brittleness suffered by the keyword-spotting approach.
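The approach described in the abstract (a message as a multiset of words, scored by naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the `"spam"`/`"ham"` labels, and the class name `NaiveBayesSketch` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical multinomial naive Bayes over word multisets."""

    def __init__(self, smoothing=1.0):
        # Add-one smoothing is an assumption, not taken from the paper.
        self.smoothing = smoothing
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # A Counter is a multiset: repeated words keep their multiplicity.
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def log_score(self, text, label):
        """Log of prior times the product of per-word likelihoods."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        prior = self.message_counts[label] / sum(self.message_counts.values())
        score = math.log(prior)
        for word, count in Counter(tokenize(text)).items():
            likelihood = (self.word_counts[label][word] + self.smoothing) / (
                total_words + self.smoothing * len(vocab)
            )
            score += count * math.log(likelihood)
        return score

    def classify(self, text):
        # Assumes at least one training message was seen for each label.
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


clf = NaiveBayesSketch()
clf.train("win free money now", "spam")
clf.train("free prize claim now", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lunch with the project team", "ham")
print(clf.classify("free money prize"))
print(clf.classify("project meeting monday"))
```

Because every word contributes a small amount of evidence, no single keyword decides the outcome, which is the contrast with keyword-spotting rules that the abstract draws.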

Similar resources

Clustering Ensemble for Spam Filtering

One of the main problems that modern e-mail systems face is the management of the high degree of spam or junk mail they receive. Those systems are expected to be able to distinguish between legitimate mail and spam, in order to present the final user as much interesting information as possible. This study presents a novel hybrid intelligent system using both unsupervised and supervised learning t...


Impact of Feature Selection on Micro-Text Classification

Social media datasets – especially Twitter tweets – are popular in the field of text classification. Tweets are a valuable source of microtext (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others [6]. Tweets often include keywords referred to as “Hashtags” that can be used as labels fo...


Estimating Posterior Probabilities In Classification Problems With Neural Networks

Classification problems are used to determine the group membership of multi-dimensional objects and are prevalent in every organization and discipline. Central to the classification determination is the posterior probability. This paper introduces the theory and applications of the classification problem, and of neural network classifiers. Through controlled experiments with problems of kno...


Towards a fully automated protein structure classification: How to get CATH classification from FSSP Z-scores

Currently, each week about 50 new protein structures are made available in public databases. The attention is focused on developing automatic methods of classification. The work of organization is being done by several groups, to a large extent independently. To our knowledge, the consistency of different classifications has never been examined on a protein by protein basis. Moreover, the potentia...



Publication date: 1998